Learning Document-Level Semantic Properties from Free-Text Annotations
نویسندگان
چکیده
This paper demonstrates a new method for leveraging free-text annotations to infer semantic properties of documents. Free-text annotations are becoming increasingly abundant, due to the recent dramatic growth in semistructured, user-generated online content. An example of such content is product reviews, which are often annotated by their authors with pros/cons keyphrases such as “a real bargain” or “good value.” To exploit such noisy annotations, we simultaneously find a hidden paraphrase structure of the keyphrases, a model of the document texts, and the underlying semantic properties that link the two. This allows us to predict properties of unannotated documents. Our approach is implemented as a hierarchical Bayesian model with joint inference, which increases the robustness of the keyphrase clustering and encourages the document model to correlate with semantically meaningful properties. We perform several evaluations of our model, and find that it substantially outperforms alternative approaches.
منابع مشابه
Learning Document-Level Semantic Properties from Free-text Annotations
This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often annotated by their authors with free-text keyphrases, such as “a real bargain” or “good value.” We leverage these unstructured annotations by clustering them into semantic properties, and then tying the induced clusters to...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملSemantator: A Semi-automatic Semantic Annotation Tool for Clinical Narratives
In this paper, we introduce Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, users can annotate document fragments with classes in the ontology to create instances and relate created instances with ontology properties. Also, Semantator enables automatic annotation by connecting to the NCBO annotator and the...
متن کاملSemantic Web-Based Document: Editing and Browsing in AktiveDoc
This paper presents a tool for supporting sharing and reuse of knowledge in document creation (writing) and use (reading). Semantic Web technologies are used to support the production of ontology based annotations while the document is written. Free text annotations (comments) can be added to integrate the knowledge in the document. In addition the tool uses external services (e.g. a Semantic W...
متن کاملSteps Towards Semantically Annotated Language Resources
The use of textual resources such as text corpora, tree banks, large-scale lexica etc., has become a widely accepted commitment in the field of computational linguistics. However the scope of the annotations proposed has been unbalanced towards the ’surface’ level. Only recently corpora with a deeper level of annotations have started to emerge. In this paper we describe a machine learning appro...
متن کامل